
Introduction

This study was born out of both curiosity and necessity. One of our teammates received higher-than-expected energy bills this winter and suddenly faced the question of whether to shiver to save money or to spend it on heating and be forced to subsist on SpaghettiOs. While spring has sprung and summer is on the horizon, we know that winter is coming and that we should prepare now so that history does not repeat itself.

As students recently armed with the tools to conduct time series analysis and forecasting, we realized we could complete our final project while aiding our teammate with knowledge for the future. Our study question thus emerged: would it be more cost-effective to heat an apartment in North Carolina with electricity or with a natural-gas-powered heater?

Full of optimism that we could complete a class requirement while doing some good for the world (for as Marvel taught us, when you help someone, you help everyone), we set out to find the data that would lead us to the answer we sought. Our journey led us to that great repository of energy knowledge, the U.S. Energy Information Administration (EIA). There we found two datasets we felt confident would help us help our teammate: “North Carolina Price of Natural Gas Delivered to Residential Customers (Dollars Per Thousand Cubic Feet)”, which contained monthly data from January 1989 through January 2023, and “Average Retail Price of Electricity by State and Sector”, which contained monthly data from January 2001 through January 2023.

Data

Both datasets were downloaded from the EIA as .csv files. The most significant data wrangling was done on the natural gas data, which was originally provided in dollars per thousand cubic feet and needed to be in dollars per kilowatt-hour for comparison with the electricity data. The electricity data was provided monthly by sector for both regions and states. The natural gas data was provided for North Carolina only, as a monthly series of dates and prices.
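The unit conversion described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the analysis, and the conversion factors are our assumptions, not values taken from the report: roughly 1,037 Btu per cubic foot of natural gas and 3,412 Btu per kilowatt-hour. The function name is hypothetical. The result is expressed in cents per kWh so that it lines up with the units of the electricity series, and it reflects energy content only.

```python
# Assumed conversion factors (not stated in the report):
BTU_PER_CUBIC_FOOT = 1037  # approximate heat content of natural gas
BTU_PER_KWH = 3412         # energy in one kilowatt-hour


def mcf_price_to_cents_per_kwh(dollars_per_mcf: float) -> float:
    """Convert a price in $/Mcf (thousand cubic feet) to cents per kWh of energy."""
    kwh_per_mcf = 1000 * BTU_PER_CUBIC_FOOT / BTU_PER_KWH
    return 100 * dollars_per_mcf / kwh_per_mcf


# January 1989 price from the cleaned natural gas sample: $6.17/Mcf
jan_1989 = mcf_price_to_cents_per_kwh(6.17)
print(f"{jan_1989:.2f} cents/kWh")  # prints "2.03 cents/kWh"
```

With a different assumed heat content the converted prices shift proportionally, so the factor is worth stating alongside any comparison.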

Table 1: Summary Statistics

Variable                   Observations     Min     Max   Mean Price   Std. Dev.   Median Price
Natural Gas ($/Mcf)                 410    5.54   30.43        13.78        5.64          12.54
Electricity (cents/kWh)             265    7.53   13.51        10.22        1.30          10.41

Sample of Cleaned Natural Gas Data (price in $/Mcf)

year        price
Jan-1989     6.17
Feb-1989     6.30
Mar-1989     6.29
Apr-1989     6.80
May-1989     6.99
Jun-1989     8.02
Jul-1989     8.71
Aug-1989     8.97
Sep-1989     8.68
Oct-1989     7.44

Sample of Cleaned Electricity Data (price in cents/kWh)

my_date     price_per_kWh
Jan.2001             7.53
Feb.2001             7.77
Mar.2001             8.02
Apr.2001             8.00
May.2001             8.22
Jun.2001             8.19
Jul.2001             8.31
Aug.2001             8.35
Sep.2001             8.35
Oct.2001             8.67

Analysis

Our analysis began by creating time series objects for each price series and plotting them, along with their ACF and PACF plots, to gain an initial sense of what the series looked like and what seasonality they might have.
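For readers unfamiliar with the ACF, the quantity behind such a plot can be sketched in a few lines. This is an illustrative Python version of the standard sample autocorrelation, not the R code used in the analysis, and the series below is synthetic rather than EIA data.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation of `series` at lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# Synthetic monthly series with a 12-month cycle (made-up values).
toy = [10 + 3 * math.sin(2 * math.pi * t / 12) for t in range(120)]
r = acf(toy, 12)
print(round(r[6], 2), round(r[12], 2))
```

A strongly negative value at lag 6 and a strongly positive value at lag 12 is the signature of an annual cycle in monthly data, which is the kind of pattern an ACF plot makes visible.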

We then decomposed the time series objects for further analysis to visualize the trend and seasonality of each series. The decomposition of the natural gas time series object shows clear seasonality in price along with a steep upward trend beginning in 2020. The decomposition of electricity prices shows a clear upward trend from the beginning of the series and shows a bimodal seasonality.
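The idea behind an additive decomposition can be sketched as below. This is a simplified classical decomposition written in Python for illustration; it is not necessarily the method applied in the analysis, and the function names and the toy series are ours. The trend is estimated with a centered moving average over one full year, and the seasonal component is the average detrended value for each month.

```python
def centered_moving_average(series, period=12):
    """Trend estimate via a 2 x period centered moving average (even period)."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        # The two end points get half weight so the window spans exactly one period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def seasonal_indices(series, trend, period=12):
    """Average detrended value per position in the cycle, centered on zero."""
    buckets = [[] for _ in range(period)]
    for t, (x, m) in enumerate(zip(series, trend)):
        if m is not None:
            buckets[t % period].append(x - m)
    raw = [sum(b) / len(b) for b in buckets]
    mean = sum(raw) / period
    return [value - mean for value in raw]


# Made-up series: linear trend plus a repeating 12-month pattern that sums to zero.
pattern = [2, 1.5, 0.5, -0.5, -1, -1.5, -1, -0.5, 0, 0, 0.5, 0]
toy = [5 + 0.1 * t + pattern[t % 12] for t in range(48)]

trend = centered_moving_average(toy)
seasonal = seasonal_indices(toy, trend)
```

On this toy series the procedure recovers both the linear trend and the monthly pattern; real price data would leave a remainder component as well.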

We developed and tested several models to determine which best fit the data and which might produce the best forecast of whether heating with electricity or natural gas would be more cost-efficient in the upcoming winter. The five we used were Seasonal ARIMA, ARIMA with Fourier terms, Neural Networks, TBATS, and STL + ETS.

Each of these models was fit with functions from the “forecast” package and required time series objects, which we created using the “ts” function (part of base R’s “stats” package).

While the Seasonal ARIMA was a logical first model to try, we quickly felt that the performance was not particularly strong and that developing and testing other models was warranted.

We decided to explore more advanced models: ARIMA with Fourier terms, Neural Networks, TBATS, and STL + ETS. ARIMA with Fourier terms is a dynamic harmonic regression model with an ARMA error structure; it uses the “fourier” function from the “forecast” package to generate the sine and cosine terms that model the seasonal components.
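To make the Fourier terms concrete, the sketch below builds the sine and cosine regressors for a monthly series in plain Python. It illustrates the idea only and is not the “fourier” function itself; the function name and the choice of K = 2 harmonic pairs are assumptions for this example, since the report does not state the value of K that was used.

```python
import math


def fourier_terms(n_obs, period, K):
    """Return n_obs rows of [sin_1, cos_1, ..., sin_K, cos_K] for t = 1..n_obs."""
    rows = []
    for t in range(1, n_obs + 1):
        row = []
        for k in range(1, K + 1):
            angle = 2 * math.pi * k * t / period
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows


# Two years of monthly regressors with K = 2 harmonic pairs (assumed value).
X = fourier_terms(n_obs=24, period=12, K=2)
print(len(X), len(X[0]))  # prints "24 4"
```

Each row is then supplied to the ARIMA model as external regressors, so the seasonal shape is captured by a handful of smooth terms rather than by seasonal differencing.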

   

We next developed and tested an STL model.

   

   

We then developed and tested a neural network model using the “nnetar()” function in the “forecast” package. We learned that the p and P arguments in nnetar(), which set the number of non-seasonal and seasonal lags used as inputs, have a significant impact on model performance. We therefore worked to identify the optimal p and P combination through trial and error.

   
Electricity Neural Net Forecast Accuracy using Various Seasonal Lag Inputs (p/P)

p/P          ME       RMSE        MAE        MPE       MAPE       ACF1  Theil’s U
1/0    -1.17118    1.29508    1.17118  -10.50142   10.50142    0.28456   45.32960
1/1    -0.87400    0.98737    0.87400   -7.61678    7.61678    0.18957    4.79878
2/0    -1.03699    1.16635    1.03699   -9.18152    9.18152    0.29320   18.63485
2/1    -0.88576    1.00183    0.88576   -7.73129    7.73129    0.21320    4.81599
2/2    -0.86795    0.99894    0.86795   -7.57857    7.57857    0.23490    4.52403
1/2    -0.87328    0.99412    0.87328   -7.61742    7.61742    0.19827    4.68371
3/1    -0.89571    1.01289    0.89571   -7.82631    7.82631    0.21960    4.89769

Natural Gas Neural Net Forecast Accuracy using Various Seasonal Lag Inputs (p/P)

p/P          ME       RMSE        MAE        MPE       MAPE       ACF1  Theil’s U
1/0    -2.29033    3.00294    2.39532  -33.52841   34.93620    0.62861   18.47518
1/1    -1.78457    2.23720    1.96980  -25.47780   27.70608    0.27788    2.42027
2/0    -2.50655    3.28959    2.68212  -40.38153   42.60932    0.67867   11.45238
2/1    -1.88117    2.21345    1.95136  -27.37077   28.11068    0.23744    2.52182
2/2    -2.01551    2.36220    2.06354  -30.12179   30.63531    0.26592    2.69001
1/2    -1.92229    2.32882    1.98813  -27.91104   28.60704    0.29740    2.65386
3/1    -1.90599    2.23197    1.95937  -27.95908   28.52778    0.23503    2.53914
3/2    -1.88861    2.23250    1.95109  -27.81592   28.47774    0.27029    2.55404
3/3    -2.05670    2.37744    2.16224  -31.14725   32.23551    0.18669    2.38939

   

We found that combination 1/1 (p = 1 and P = 1) had the best modeling performance for our electricity series, and combination 3/2 (p = 3 and P = 2) had the best modeling performance for our natural gas series. We then used combination 1/1 to run a neural network model with Fourier terms for our electricity series.

   

   

We then developed and tested TBATS models for our electricity and natural gas series.

   

   

   

As we thought about other ways to model our time series, we realized that the war in Ukraine has had a significant impact on natural gas prices, and likely on electricity prices too. We also realized that temperature would be a good regressor to include, since utility bills normally fluctuate with temperature. We therefore created two covariates: UKRWAR and temperature. UKRWAR is an indicator variable that takes the value 0 for months before March 2022 and 1 for March 2022 and every month after. We set the cutoff at March 2022, even though the war started in February of that year, because the war began late in that month and its effect on the February 2022 monthly price should therefore be limited. The temperature series is the monthly average temperature for the Raleigh area, the largest geographic level for which we found historical temperature data.
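The UKRWAR indicator described above can be sketched as follows. This is an illustrative Python version, not the R code used in the analysis; the helper name `month_range` is ours, and the date range is the January 2001 through January 2023 span of the electricity series.

```python
from datetime import date


def month_range(start, end):
    """First-of-month dates from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


WAR_CUTOFF = date(2022, 3, 1)  # first month coded as 1

months = month_range(date(2001, 1, 1), date(2023, 1, 1))
ukrwar = [1 if m >= WAR_CUTOFF else 0 for m in months]

print(len(ukrwar), sum(ukrwar))  # prints "265 11"
```

Only the last 11 of the 265 months carry a 1, so the indicator has very little variation to work with.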

   

After creating the covariates, we repeated our modeling with the covariates included to try to improve the accuracy of the models. First, we incorporated the covariates into our neural network model.

   

   

   

   

We then developed a seasonal ARIMA model with temperature and Fourier terms as regressors to model the two series. UKRWAR was excluded as a covariate because R reported that no suitable ARIMA model could be found when it was included. The function used for this was “auto.arima()”, with the regressors passed through its “xreg” argument.

   

   

   

   

Summary and Conclusions

Comparing the accuracy of all the models we tested, the STL model emerged as the best fit for both the price of electricity and the price of natural gas on both RMSE and MAPE. Both STL models missed the heights seen in 2022, however. There could be several reasons for this. One is the difficulty of modeling the start of the war in Ukraine and its impact on natural gas prices, both regionally and around the world. Another is the difficulty of modeling the influence of generationally high inflation. With these two factors compounding each other, it is perhaps not surprising that all of the models failed to predict the rise in price for both electricity and natural gas.
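The two scores used for this comparison can be sketched as below. These are the standard definitions of RMSE and MAPE written in Python for illustration; the actual and forecast values are made up and are not taken from our test set.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error: penalizes large misses more heavily."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent of the actual value."""
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(ratios) / len(ratios)


# Made-up values for illustration only.
actual = [12.0, 11.5, 13.0, 14.0]
forecast = [11.0, 12.0, 12.5, 15.0]

print(round(rmse(actual, forecast), 3))  # prints 0.791
print(round(mape(actual, forecast), 2))  # prints 5.92
```

RMSE is in the units of the series, so it cannot be compared across the electricity and natural gas tables, whereas MAPE is scale-free and can.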

   

## The best model for electricity by RMSE is:  STL
Forecast Accuracy for NC Residential Electricity Price

Model                                     ME       RMSE        MAE        MPE       MAPE       ACF1  Theil’s U
SARIMA                              -0.67702    0.83764    0.70998   -5.83726    6.12612    0.25988    2.73983
ARIMA with Fourier                  -0.69902    0.87542    0.69902   -6.04752    6.04752    0.08221    2.82856
STL                                 -0.58551    0.76708    0.63447   -5.01390    5.43949    0.25835    2.36056
Neural Network                      -1.04560    1.20985    1.04560   -9.37110    9.37110    0.16600    3.02200
TBATS                               -0.74971    0.89063    0.75658   -6.50522    6.56626    0.23462    2.86546
Neural Network with Covariates      -0.96011    1.14025    0.98911   -8.57773    8.83244    0.41622    2.70566
SARIMA with Temp and Fourier        -0.67930    0.81795    0.69585   -5.83257    5.97891    0.26220    2.92123

## The best model for natural gas by RMSE is:  STL
Forecast Accuracy for NC Residential Natural Gas Price

Model                                     ME       RMSE        MAE        MPE       MAPE       ACF1  Theil’s U
SARIMA                              -1.67902    1.97784    1.67902  -23.35174   23.35174   -0.15535    1.81675
ARIMA with Fourier                  -1.73518    2.31200    1.83981  -25.17044   26.24995    0.57850    3.12541
STL                                 -1.04788    1.35599    1.11314  -14.03428   14.80904   -0.33348    1.28849
Neural Network                      -1.91617    2.25890    1.97809  -28.12139   28.77757    0.44954    2.70104
TBATS                               -1.72858    1.89245    1.72858  -24.53539   24.53539   -0.32231    2.08610
Neural Network with Covariates      -1.20667    1.73487    1.45109  -15.21679   18.23565    0.49686    2.00937
SARIMA with Temp and Fourier        -1.63275    1.79495    1.63275  -21.52016   21.52016   -0.01071    1.95240
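
The selection step behind the "best model by RMSE" lines can be sketched as follows. This is an illustrative Python version, not the R code that produced the output; the RMSE values are copied from the two accuracy tables, and the function name is ours.

```python
# RMSE values copied from the two forecast accuracy tables.
electricity_rmse = {
    "SARIMA": 0.83764,
    "ARIMA with Fourier": 0.87542,
    "STL": 0.76708,
    "Neural Network": 1.20985,
    "TBATS": 0.89063,
    "Neural Network with Covariates": 1.14025,
    "SARIMA with Temp and Fourier": 0.81795,
}
natural_gas_rmse = {
    "SARIMA": 1.97784,
    "ARIMA with Fourier": 2.31200,
    "STL": 1.35599,
    "Neural Network": 2.25890,
    "TBATS": 1.89245,
    "Neural Network with Covariates": 1.73487,
    "SARIMA with Temp and Fourier": 1.79495,
}


def best_by_rmse(scores):
    """Return the model name with the lowest RMSE."""
    return min(scores, key=scores.get)


print(best_by_rmse(electricity_rmse))  # prints "STL"
print(best_by_rmse(natural_gas_rmse))  # prints "STL"
```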

   

Our initial research question asked how our teammate could best plan for surviving, if not thriving, in the upcoming winter. Based on our analysis, the answer appears to be that he would be best off buying a natural-gas-powered heater. Given the vicissitudes of energy prices seen in recent months, and the uncertainty of global and domestic events that could further affect energy prices, this is hard to say with certainty. Revisiting these forecasts with updated data, both ahead of the upcoming winter and annually thereafter, would be well-advised.